changes to address feedback by abarciauskas-bgse · Pull Request #52 · NASA-IMPACT/virtual-stores-feasibility-report

abarciauskas-bgse · 2026-05-09T09:36:10Z

No description provided.

github-actions · 2026-05-09T09:36:46Z

PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-05-29 22:50 UTC

doug-newman-nasa · 2026-05-13T12:07:38Z

Can we add something about the ease of rechunking based on a virtual Zarr store?


abarciauskas-bgse · 2026-05-15T22:24:50Z

@doug-newman-nasa yes, thank you for the reminder. I've added a line in this commit: 078b138.

TomNicholas · 2026-05-19T20:57:49Z

 <img style="height: 150px; margin: 0px auto; display: block" alt="Simple Virtual Zarr Graphic" src="./graphics/simple-virtual-zarr.svg" />

-Virtual stores deliver a single entrypoint to a dataset comprised of many files. For NASA datasets this enables:
+The performance of legacy scientific data formats is poor in a cloud environment. Cloud-optimized formats like Zarr, COG, and cloud-optimized HDF5 address this — but reprocessing or copying the entire NASA archive into these formats is not feasible. Virtual stores bridge this gap: they provide cloud-optimized access to existing archived data without copying it.


> reprocessing or copying the entire NASA archive into these formats is not feasible

Why not? In an ideal world we would not need virtual stores, because data providers would write their data into the cloud in a suitable format in the first place. IMO we should be careful not to encourage a narrative that lift-and-shift is totally fine and okay because virtualization exists.

It's my understanding that these archival formats exist for a reason. The inclusion of metadata in each file is a self-describing feature which makes it possible to move and use files on their own. I don't think NASA is ready to shift users away or able to relinquish this archival format requirement. But I'm curious what @doug-newman-nasa would say to that.

> makes it possible to move and use files on their own.

If file download is treated as an access pattern rather than the source of truth then that need can still be served from cloud-native stores.

> I don't think NASA is ready to shift users away or able to relinquish this archival format requirement.

Yeah I'm not suggesting that NASA is at all ready for this paradigm shift today, but I am suggesting that that is the real ultimate goal, and maybe we should make that explicit.

ah ok I see, I think I can rephrase it with that goal in mind...

@TomNicholas let me know what you think of the rewording in 9adbd0a

Yes, I much prefer that framing, thank you. Only nit is that in the final sentence you said that virtual stores avoid the need for "reprocessing", but I think you really mean they avoid the need for "duplicating".

With "reprocessing", I was thinking of the case where data needs to be rechunked into cloud-friendly chunk structures. But I'm fine with "duplicating the underlying data" as I think it covers the bases of either reprocessing to cloud-optimized chunks in the same or a new format.
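To make the "referencing vs. duplicating" distinction in this thread concrete, here is a minimal sketch of the idea, assuming a toy manifest that maps chunk keys to byte ranges inside an existing file. The names `build_manifest` and `read_chunk`, the `temperature/<i>` key layout, and the fixed-size chunk layout are all hypothetical illustrations; this is not the actual API or on-disk format of Icechunk, VirtualiZarr, or Zarr.

```python
import os
import tempfile


def build_manifest(path, chunk_nbytes, n_chunks, header_nbytes):
    """Map hypothetical chunk keys to (path, offset, length) byte ranges.

    Only references are recorded; no chunk bytes are copied.
    """
    return {
        f"temperature/{i}": (path, header_nbytes + i * chunk_nbytes, chunk_nbytes)
        for i in range(n_chunks)
    }


def read_chunk(manifest, key):
    """Resolve a chunk key to its byte range and read it from the original file."""
    path, offset, length = manifest[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


# Stand-in for an archival file: a 4-byte header followed by three 8-byte chunks.
header = b"HDR!"
chunks = [bytes([i]) * 8 for i in range(3)]
tmp = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
tmp.write(header + b"".join(chunks))
tmp.close()

manifest = build_manifest(
    tmp.name, chunk_nbytes=8, n_chunks=3, header_nbytes=len(header)
)

# The chunk is served straight from the archival file; nothing was duplicated.
second = read_chunk(manifest, "temperature/1")

os.unlink(tmp.name)
```

In this sketch the manifest is the only new artifact. Rechunking, by contrast, would require writing new chunk bytes somewhere, which is the "reprocessing" case discussed above.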

owenlittlejohns · 2026-05-19T22:56:04Z

 ## Language and ecosystem constraints

-Icechunk is written in Rust with an API in Python. Users and data providers working in other languages (Julia, R, Java, etc.) may face limited or no support for reading and writing Icechunk stores.
+Icechunk is written in Rust with an API in Python. Users and data providers working in other languages (Julia, R, Java, etc.) may face limited or no support for reading and writing Icechunk stores. Rust presents an organizational risk similar to what NASA has experienced with niche languages in other systems: supporting and extending Icechunk long-term would require NASA staff or contractors with Rust expertise, which is not yet widely available in the earth science community. Rust is seeing broader general adoption than some past niche languages, which reduces but does not eliminate this risk.


I think this also captures Doug's feedback nicely. (Without naming a particular software system and its language choices 😉)

Is this the most important limitation to list, though? Feels like the points about chunk shape/size and other data product related considerations might want to get the top billing, and this could be moved down maybe? Or would that just bury the concern?

Good point, I actually think it's the least important. I hadn't thought about the order signaling the significance of each limitation, but I think the new order (moving the language limitation to the bottom) reflects the ordering of significance that I would propose.

76ac286


owenlittlejohns

Thanks for the updates @abarciauskas-bgse!

changes to address feedback

0d630cd

Add comment about deck.gl-zarr

dfbf033

owenlittlejohns mentioned this pull request May 11, 2026

Fix bullet points in executive summary. #53

Merged

Rephrase recommendation on chunk structure

fb4e504

abarciauskas-bgse requested a review from sharkinsspatial May 11, 2026 21:49

abarciauskas-bgse marked this pull request as ready for review May 11, 2026 21:51

abarciauskas-bgse self-assigned this May 14, 2026

Add benefit of virtual stores for new products

078b138

Added a point about virtual stores simplifying rechunking or regridding.

TomNicholas reviewed May 19, 2026


owenlittlejohns reviewed May 19, 2026


abarciauskas-bgse added 2 commits May 29, 2026 09:17

Enhance description of cloud-native archival access

9adbd0a

Revised explanation of legacy scientific data formats and their performance in cloud environments. Clarified the role of virtual stores in providing cloud-optimized access without data duplication.

Move language and ecosystem constraints section to bottom of limitations

76ac286

abarciauskas-bgse mentioned this pull request May 29, 2026

ODD PI 26.3 Objective 1: 🤖 Develop + Maintain the Virtual Zarr Ecosystem NASA-IMPACT/veda-odd#346

Open

4 tasks

Replace reprocessing with duplicating

1aec391

owenlittlejohns approved these changes May 29, 2026


abarciauskas-bgse merged commit 15d8e25 into main May 29, 2026
2 checks passed
